ABSTRACT

This paper has two aims: to exhibit very general conditions under which
members of a broad class of unconstrained minimization algorithms are globally
convergent in a strong sense, and to propose several new algorithms that use
second derivative information and achieve such convergence. In the first part of
the paper we present a general trust region based algorithm schema that
includes an undefined step selection strategy. We give general conditions on this
step selection strategy under which limit points of the algorithm will satisfy first
and second order necessary conditions for unconstrained minimization. Our
algorithm schema is sufficiently broad to include line search algorithms as well.
Next, we show that a wide range of step selection strategies satisfy the
requirements of our convergence theory. This leads us to propose several new
algorithms that use second derivative information and achieve strong global
convergence, including an indefinite line search algorithm, several indefinite
dogleg algorithms, and a modified "optimal-step" algorithm. Finally, we propose an
implementation of one such indefinite dogleg algorithm.

1. Introduction

In this paper we discuss the convergence properties of a broad class of
algorithms for the unconstrained minimization problem

$$\min_{x \in R^n} f(x), \qquad f: R^n \to R, \qquad (1.1)$$

where  it  is  assumed  that  f  is  twice  continuously  differentiable.  The  algorithms 
discussed  are  of  the  trust  region  type,  but  the  algorithm  schema  used  is 
sufficiently  general  that  our  convergence  results  apply  to  many  algorithms  of 
the  line  search  type  as  well. 

In the first part of the paper we give a general condition under which the
limit points of a broad class of trust region algorithms satisfy the first order
necessary conditions for Problem 1.1. In this paper we shall call such an
algorithm "first order stationary point convergent". At the same time, we give a
general condition that shows how the limit points of these algorithms may
satisfy the second order necessary conditions for 1.1 by incorporating second
order information. We shall refer to such an algorithm as "second order
stationary point convergent".

In the second part of the paper, we show that many algorithms satisfy these
conditions for first and second order stationary point convergence, and we
suggest several new algorithms that use second order information.

The convergence results presented here are a generalization of those given
by Sorensen [1980]. Sorensen proves strong convergence properties for a
specific trust region algorithm, which uses second order information. Others,
including Fletcher and Freeman [1977], Goldfarb [1980], Kaniel and Dax [1979],
McCormick [1977], Moré and Sorensen [1979], Mukai and Polak [1978], and Vial
and Zang [1975], have discussed and proven the second order stationary point
convergence of algorithms that use second order information but are not of the
trust region type. Powell [1975], on the other hand, discusses the first order
stationary point convergence properties of a class of trust region algorithms.

In Section 2 we define our general algorithm schema, state the conditions
for the types of convergence mentioned above, and prove the convergence
results. In Section 3 we take the first step toward showing the applicability of
the class of algorithms by commenting that practically all trust radius adjusting
strategies in use fit into our algorithm schema. In Sections 4 and 5 we further
show the meaning of the schema by discussing a variety of different types of
step selection strategies that satisfy the conditions given in Section 2. Finally, in
Section 6 we propose an implementation of one of these, an "indefinite dogleg"
algorithm.

In the remainder of the paper we use the following notation:

$\|\cdot\|$ is the Euclidean norm.

$g(x) \in R^n$ is the gradient of $f$ evaluated at $x$.

$H(x) \in R^{n \times n}$ is the Hessian of $f$ evaluated at $x$.

$\{x_k\}$ is a sequence of points generated by an algorithm, and $f_k = f(x_k)$,
$g_k = g(x_k)$, and $H_k = H(x_k)$.

$\lambda_1(B)$ and $\lambda_n(B)$ are the smallest and largest eigenvalues, respectively, of the
symmetric matrix $B$.

$[u_1, \ldots, u_m]$ is the subspace of $R^n$ spanned by the vectors $u_1, \ldots, u_m$.

2. Global Convergence of a General Trust Region Algorithm

In this section we describe a class of trust region algorithms in a way that
includes most trust region algorithms as well as many other algorithms, and
that isolates the conditions they may meet in order to have various convergence
properties.

The form of most existing trust region algorithms is basically as follows.
The algorithm generates a sequence of points $x_k$. At the k-th iteration, it forms
a quadratic model of the objective function about $x_k$,

$$\phi_k(w) = f_k + g_k^T w + \tfrac{1}{2} w^T B_k w,$$

where $w \in R^n$ and $B_k \in R^{n \times n}$ is some symmetric matrix, and finds an initial value
for the trust radius, $\Delta_k$. Then a "minor iteration" is performed, possibly
repeatedly. The minor iteration consists of using the current trust radius $\Delta_k$ and the
information contained in the quadratic model to compute a step

$$p_k(\Delta_k) = p(g_k, B_k, \Delta_k)$$

and then comparing the actual reduction of the objective function

$$\mathrm{ared}_k(\Delta_k) = f_k - f(x_k + p_k(\Delta_k))$$

to the reduction predicted by the quadratic model

$$\mathrm{pred}_k(\Delta_k) = f_k - \phi_k(p_k(\Delta_k)).$$

If the reduction is satisfactory, then the step can be taken, or a larger trust
region tried. Otherwise the trust region is reduced and the minor iteration is
repeated.

Three aspects of this algorithm are unspecified, namely how to form the
matrix $B_k$ for the quadratic model, how the step computing function $p(g, B, \Delta)$ is
performed on each minor iteration, and how the trust radius $\Delta_k$ is adjusted. In
our abstract definition of a trust region algorithm below, the minor iterations
and the strategy for adjusting the trust region are replaced by a condition that
the step and trust radius must satisfy upon quitting the major iteration. This
allows the description to cover a wide variety of trust region strategies. The
methods of computing $B_k$ and $p(g, B, \Delta)$ are left unspecified, since we later want
to give conditions on these quantities that ensure the convergence properties.
For our abstract definition of a trust region algorithm it is enough to know that
they are computed in such a way that the algorithm is well-defined.

We  now  define  the  general  trust  region  algorithm: 


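Algorithm 2.1

0) Given $\gamma_1, \eta_1, \eta_2 \in (0, 1)$, and $\Delta_0 > 0$, $k = 1$.

1) Compute $g_k = g(x_k)$, symmetric $B_k \in R^{n \times n}$.

2) Find $\Delta_k$ and compute $p_k = p_k(\Delta_k)$ satisfying:

   a) $\|p_k\| \le \Delta_k$ and $\dfrac{\mathrm{ared}_k(\Delta_k)}{\mathrm{pred}_k(\Delta_k)} \ge \eta_1$, and

   b) either $\Delta_k \ge \Delta_{k-1}$, or for some $\Delta \le \dfrac{1}{\gamma_1}\Delta_k$,
      $\dfrac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} < \eta_2$ or $\dfrac{\mathrm{ared}_{k-1}(\Delta)}{\mathrm{pred}_{k-1}(\Delta)} < \eta_2$.

3) $x_{k+1} = x_k + p_k$, $k = k + 1$.

4) Go to 1).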


Again, note that the computations of $B_k$, $p_k(\Delta)$, and $\Delta_k$ are left unspecified.
In Theorem 2.2 we give conditions on $B_k$ and $p(g, B, \Delta)$ that yield various
convergence properties. In Section 3 we will discuss a number of trust radius adjusting
strategies that satisfy the requirements in Algorithm 2.1, step 2).
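
As an illustration only, the schema of Algorithm 2.1 can be sketched in a few lines of Python. Everything specific in this sketch is an assumption and not part of the schema: the step function is the best gradient step, the radius is halved after a rejected trial (so gamma1 = 0.5), it is doubled after a very successful step, and the names `trust_region` and `cauchy_step` and the default values of `eta1` and `eta2` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(B, v):
    return [dot(row, v) for row in B]

def norm(v):
    return math.sqrt(dot(v, v))

def cauchy_step(g, B, delta):
    """Minimize the quadratic model along -g within radius delta."""
    gn = norm(g)
    if gn == 0.0:
        return [0.0] * len(g)
    curv = dot(g, matvec(B, g))
    t = delta / gn                      # step length that reaches the boundary
    if curv > 0.0:
        t = min(t, gn * gn / curv)      # unconstrained minimizer along -g
    return [-t * gi for gi in g]

def trust_region(f, grad, hess, x, delta=1.0, eta1=0.1, eta2=0.75,
                 gamma1=0.5, tol=1e-8, max_iter=500, step=cauchy_step):
    """Sketch of the schema; assumes eta1 <= eta2 so that step 2b) holds."""
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if norm(g) <= tol:
            break
        while True:                     # minor iterations
            p = step(g, B, delta)
            pred = -dot(g, p) - 0.5 * dot(p, matvec(B, p))
            x_new = [xi + pi for xi, pi in zip(x, p)]
            ared = f(x) - f(x_new)
            if ared >= eta1 * pred:     # condition a) of step 2)
                break
            delta *= gamma1             # reduce by the fixed factor gamma1
        x = x_new
        if ared >= eta2 * pred:
            delta *= 2.0                # optional enlargement, always permitted
    return x
```

The radius is reduced only after a trial radius has failed the test in a), which is one simple way to meet the requirement in b).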


Now we set forth conditions which the step computing function $p(g, B, \Delta)$
may satisfy and prove that if it does meet these conditions then the
convergence results follow. In Sections 4 and 5 we will discuss various step computing
algorithms that fulfill the conditions below.

The first condition says that the step must give sufficient decrease of the
quadratic model. The second condition requires that when $H(x)$ is indefinite the
step give as good a decrease of the quadratic model as a direction of sufficient
negative curvature. The third condition simply says that if the Hessian is
positive definite and the Newton step lies within the trust region, then the Newton
step is chosen.

Before  stating  the  conditions  we  define  some  additional  notation. 

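$$\mathrm{pred}(g, B, \Delta) = -g^T p(g, B, \Delta) - \tfrac{1}{2}\, p(g, B, \Delta)^T B\, p(g, B, \Delta).$$

Our conditions that a step selection strategy may satisfy are:

Condition #1

There are $\bar{c}_1, \sigma_1 > 0$ such that for all $g \in R^n$, for all symmetric $B \in R^{n \times n}$, and for all
$\Delta > 0$, $\mathrm{pred}(g, B, \Delta) \ge \bar{c}_1 \|g\| \min\left(\Delta, \dfrac{\sigma_1 \|g\|}{\|B\|}\right)$.

Condition #2

There is a $\bar{c}_2 > 0$ such that for all $g \in R^n$, for all symmetric $B \in R^{n \times n}$, and for all
$\Delta > 0$, $\mathrm{pred}(g, B, \Delta) \ge \bar{c}_2 (-\lambda_1(B)) \Delta^2$.

Condition #3

If $B$ is positive definite and $\|-B^{-1} g\| \le \Delta$, then $p(g, B, \Delta) = -B^{-1} g$.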

We now state and prove the convergence theorem. The proofs are similar to
those of Sorensen [1980]. Conditions #1, #2, and #3 constitute a major
generalization of his assumption that

$$p(g, B, \Delta) = \arg\min \{\, g^T w + \tfrac{1}{2} w^T B w : \|w\| \le \Delta \,\}.$$

Theorem 2.2

Let $f: R^n \to R$ be twice continuously differentiable and bounded below, and let
$H(x)$ satisfy $\|H(x)\| \le \beta_1$ for all $x \in R^n$. Suppose that an algorithm satisfying the
conditions of Algorithm 2.1 is applied to $f(x)$, starting from some $x_1 \in R^n$,
generating a sequence $\{x_k\}$, $x_k \in R^n$, $k = 1, 2, \ldots$. Then:

I. If $p(g, B, \Delta)$ satisfies Condition #1 and $\|B_k\| \le \beta_2$ for all $k$, then $g_k$ converges
to 0 (first order stationary point convergence).

II. If $p(g, B, \Delta)$ satisfies Conditions #1 and #3, $B_k = H(x_k)$ for all $k$, $H(x)$ is
Lipschitz continuous with constant $L$, and $x_*$ is a limit point of $\{x_k\}$ with $H(x_*)$
positive definite, then $x_k$ converges q-quadratically to $x_*$.

III. If $p(g, B, \Delta)$ satisfies Conditions #1 and #2, $B_k = H(x_k)$ for all $k$, $H(x)$ is
uniformly continuous, and $x_k$ converges to $x_*$, then $H(x_*)$ is positive semi-definite
(second order stationary point convergence, with I.).


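Proof.

Each of the proofs of I, II, and III uses the following fact:

Lemma. If there is a positive integer $M$ and a function $w(\Delta)$ such that

1) $\lim_{\Delta \to 0^+} w(\Delta) = 0$,

2) for all $\Delta > 0$, for all $k \ge M$,

$$\left| \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1 \right| \le w(\Delta),$$

3) each $\Delta_k$ satisfies the trust radius requirement in step 2b) of Algorithm 2.1,

then $\{\Delta_k\}$ is bounded away from 0.

Proof of the lemma: By 1) and 2), there is a $\bar{\Delta} > 0$ such that if $0 < \Delta < \bar{\Delta}$ and $k \ge M$,
then $\dfrac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} \ge \eta_2$. Thus, for $k \ge M + 1$, if $\Delta_k < \Delta_{k-1}$, then by 3) there must be some
$\Delta \le \dfrac{1}{\gamma_1}\Delta_k$ which either has

$$\frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} < \eta_2 \quad\text{or}\quad \frac{\mathrm{ared}_{k-1}(\Delta)}{\mathrm{pred}_{k-1}(\Delta)} < \eta_2.$$

But that means that $\Delta \ge \bar{\Delta}$, so $\Delta_k \ge \gamma_1 \Delta \ge \gamma_1 \bar{\Delta}$. Hence, for $k \ge M + 1$,
$\Delta_k \ge \min(\Delta_{k-1}, \gamma_1 \bar{\Delta})$, so clearly $\{\Delta_k\}$ is bounded away from 0.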

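Each of the three parts also uses the following:

By Taylor's theorem, for any $k$ and any $\Delta > 0$,

$$\begin{aligned}
|\mathrm{ared}_k(\Delta) - \mathrm{pred}_k(\Delta)|
&= \left| f_k - f(x_k + p_k(\Delta)) - \left( f_k - f_k - g_k^T p_k(\Delta) - \tfrac{1}{2} p_k(\Delta)^T B_k p_k(\Delta) \right) \right| \\
&= \left| \tfrac{1}{2} p_k(\Delta)^T B_k p_k(\Delta) - \int_0^1 p_k(\Delta)^T H(x_k + \xi p_k(\Delta))\, p_k(\Delta) (1 - \xi)\, d\xi \right| \\
&\le \|p_k(\Delta)\|^2 \int_0^1 \| B_k - H(x_k + \xi p_k(\Delta)) \| (1 - \xi)\, d\xi .
\end{aligned}$$

So,

$$\left| \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1 \right|
\le \frac{\|p_k(\Delta)\|^2 \int_0^1 \| B_k - H(x_k + \xi p_k(\Delta)) \| (1 - \xi)\, d\xi}{|\mathrm{pred}_k(\Delta)|}.$$

All three parts proceed by using the relevant hypotheses and the above
argument to bound $\mathrm{pred}_k(\Delta)$ below by a term that is $O(\Delta^2)$, and then using the lemma
above.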

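Proof of I: Consider any $m$ with $\|g_m\| \ne 0$.

For any $x$, $\|g(x) - g_m\| \le \beta_1 \|x - x_m\|$, so if $\|x - x_m\| < \dfrac{\|g_m\|}{2\beta_1}$, then

$$\|g(x)\| \ge \|g_m\| - \|g(x) - g_m\| \ge \frac{\|g_m\|}{2}.$$

Call $R = \dfrac{\|g_m\|}{2\beta_1}$ and $B_R = \{x : \|x - x_m\| < R\}$.

Now, there are two possibilities. Either for all $k \ge m$, $x_k \in B_R$, or eventually
$\{x_k\}$ leaves the ball $B_R$. It turns out that the sequence cannot stay in the ball.

If $x_k \in B_R$ for all $k \ge m$, then for all $k \ge m$, $\|g_k\| \ge \dfrac{\|g_m\|}{2}$, which we shall call $\epsilon$.
Thus, by Condition #1,

$$\mathrm{pred}_k(\Delta) \ge \bar{c}_1 \|g_k\| \min\left(\Delta, \frac{\sigma_1 \|g_k\|}{\|B_k\|}\right) \ge \sigma \epsilon \min\left(\Delta, \frac{\epsilon}{\beta_2}\right)$$

for all $k \ge m$, where $\sigma = \bar{c}_1 \sigma_1$ is used to simplify the notation. So,

$$\left| \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1 \right|
\le \frac{\Delta^2 \int_0^1 \| B_k - H(x_k + \xi p_k(\Delta)) \| (1 - \xi)\, d\xi}{\sigma \epsilon \min\left(\Delta, \frac{\epsilon}{\beta_2}\right)}
\le \frac{\Delta^2 (\beta_1 + \beta_2)}{\sigma \epsilon \min\left(\Delta, \frac{\epsilon}{\beta_2}\right)}
= \frac{\Delta (\beta_1 + \beta_2)}{\sigma \epsilon}$$

for all $k \ge m$ and $\Delta \le \dfrac{\epsilon}{\beta_2}$. Applying the lemma with $w(\Delta) = \dfrac{\Delta(\beta_1 + \beta_2)}{\sigma \epsilon}$, and $M = m$, we
see that $\{\Delta_k\}$ is bounded away from 0. But, since

$$f_k - f_{k+1} = \mathrm{ared}_k(\Delta_k) \ge \eta_1 \mathrm{pred}_k(\Delta_k) \ge \eta_1 \sigma \epsilon \min\left(\Delta_k, \frac{\epsilon}{\beta_2}\right),$$

and $f$ is bounded below, $\Delta_k$ converges to 0, which is a contradiction. Hence,
eventually $\{x_k\}$ must be outside $B_R$ for some $k > m$.

Let $l + 1$ be the first index after $m$ with $x_{l+1}$ not in $B_R$. Then

$$\begin{aligned}
f(x_m) - f(x_{l+1}) &= \sum_{k=m}^{l} \left( f(x_k) - f(x_{k+1}) \right) \\
&\ge \sum_{k=m}^{l} \eta_1 \mathrm{pred}_k(\Delta_k) \ge \eta_1 \sigma \epsilon \sum_{k=m}^{l} \min\left(\Delta_k, \frac{\epsilon}{\beta_2}\right) \\
&\ge \eta_1 \sigma \epsilon \min\left( \sum_{k=m}^{l} \Delta_k, \frac{\epsilon}{\beta_2} \right)
\ge \eta_1 \sigma \epsilon \min\left( \sum_{k=m}^{l} \|p_k(\Delta_k)\|, \frac{\epsilon}{\beta_2} \right) \\
&\ge \eta_1 \sigma \epsilon \min\left( R, \frac{\epsilon}{\beta_2} \right)
= \eta_1 \sigma \frac{\|g_m\|}{2} \min\left( \frac{\|g_m\|}{2\beta_1}, \frac{\|g_m\|}{2\beta_2} \right) \\
&= \tfrac{1}{4} \eta_1 \sigma \|g_m\|^2 \min\left( \frac{1}{\beta_1}, \frac{1}{\beta_2} \right).
\end{aligned}$$

Now, since $f$ is bounded below and $\{f(x_k)\}$ is monotonically decreasing, $\{f(x_k)\}$
converges to some limit, say $f_*$. Then by the above, for any $k$,

$$\|g_k\|^2 \le \left( \tfrac{1}{4} \eta_1 \sigma \min\left( \frac{1}{\beta_1}, \frac{1}{\beta_2} \right) \right)^{-1} \left( f(x_k) - f_* \right).$$

Thus since $f(x_k) \to f_*$, $\|g_k\| \to 0$.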

Proof of II: By assumption, $x_*$ is a limit point, say $\{x_{k_j}\}$ converges to $x_*$. We
will show first that in fact, if $H(x_*)$ is positive definite, then $x_k$ converges to $x_*$.
By I, $g(x_*) = 0$. Since $H(x_*)$ is positive definite and $H$ is continuous, we can find
$\delta_1 > 0$ such that if $\|x - x_*\| < \delta_1$, then $H(x)$ is positive definite, and if $x \ne x_*$ then
$g(x) \ne 0$. Call $B_1 = \{x : \|x - x_*\| < \delta_1\}$.

Since $g(x_*) = 0$, we can find $\delta_2 > 0$, with $\|H(x)^{-1} g(x)\| < \dfrac{\delta_1}{2}$ for all
$x \in B_2 = \{x : \|x - x_*\| < \delta_2\}$. Also, take $\delta_2 < \dfrac{\delta_1}{2}$.

Find $j_0$ such that $f(x_{k_{j_0}}) < \inf \{ f(x) : x \in B_1 - B_2 \}$, and $x_{k_{j_0}} \in B_2$. Consider any
$x_i$ with $i \ge k_{j_0}$, $x_i \in B_2$. We claim that $x_{i+1} \in B_2$, which implies that the entire
sequence beyond $x_{k_{j_0}}$ is in $B_2$. If $x_{i+1}$ is not in $B_2$, then since $f(x_{i+1}) < f(x_{k_{j_0}})$, $x_{i+1}$ is not
in $B_1$, either, so

$$\Delta_i \ge \|x_{i+1} - x_i\| \ge \|x_{i+1} - x_*\| - \|x_i - x_*\| \ge \delta_1 - \frac{\delta_1}{2} = \frac{\delta_1}{2} > \|H(x_i)^{-1} g(x_i)\| .$$

But, since the Newton step from $x_i$ is within the trust region, by Condition #3,
$p_i(\Delta_i) = -H(x_i)^{-1} g(x_i)$. But then since $\|p_i(\Delta_i)\| < \dfrac{\delta_1}{2}$, $x_{i+1} \in B_1$, which is a
contradiction.

Thus for all $k \ge k_{j_0}$, $x_k \in B_2$, and so since $f(x_k)$ is a strictly decreasing
sequence and $x_*$ is the unique minimizer of $f$ in $B_2$, we have that $x_k$ converges
to $x_*$.

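Now, to show that the convergence rate is quadratic, we show that $\{\Delta_k\}$ is
bounded away from 0, which gives the result, since $\|H(x_k)^{-1} g(x_k)\|$ converges
to 0, so eventually, by Condition #3, the Newton step will always be taken. Then
by a standard theorem the Lipschitz continuity of $H$ implies the quadratic
convergence rate.

To show that $\{\Delta_k\}$ is bounded away from 0, we will again use the lemma. In
order to do so, we need the appropriate lower bound on $\mathrm{pred}_k(\Delta)$.

By Condition #1, with $\sigma = \bar{c}_1 \sigma_1$,

$$\mathrm{pred}_k(\Delta) \ge \sigma \|g_k\| \min\left(\Delta, \frac{\|g_k\|}{\|B_k\|}\right) \ge \sigma \|g_k\| \min\left(\|p_k(\Delta)\|, \frac{\|g_k\|}{\|B_k\|}\right),$$

and for all $k$ large enough, $B_k = H(x_k)$ is positive definite, so either the Newton
step is longer than the trust radius, or $p_k(\Delta)$ is the Newton step. In either case,
$\|p_k(\Delta)\| \le \|-B_k^{-1} g_k\| \le \|B_k^{-1}\| \, \|g_k\|$, so $\|g_k\| \ge \dfrac{\|p_k(\Delta)\|}{\|B_k^{-1}\|}$. Thus,

$$\mathrm{pred}_k(\Delta) \ge \sigma \frac{\|p_k(\Delta)\|}{\|B_k^{-1}\|} \min\left( \|p_k(\Delta)\|, \frac{\|p_k(\Delta)\|}{\|B_k\| \, \|B_k^{-1}\|} \right)
= \sigma \|p_k(\Delta)\|^2 \min\left( \frac{1}{\|B_k^{-1}\|}, \frac{1}{\|B_k\| \, \|B_k^{-1}\|^2} \right).$$

Now call $c_* = \dfrac{1}{2} \min\left( \dfrac{1}{\|H(x_*)^{-1}\|}, \dfrac{1}{\|H(x_*)\| \, \|H(x_*)^{-1}\|^2} \right)$, and note that by continuity there
is an $M$ such that for $k \ge M$, $B_k$ is positive definite and

$$\min\left( \frac{1}{\|B_k^{-1}\|}, \frac{1}{\|B_k\| \, \|B_k^{-1}\|^2} \right) \ge c_* .$$

Finally, note that by the argument given earlier and Lipschitz continuity,

$$|\mathrm{ared}_k(\Delta) - \mathrm{pred}_k(\Delta)| \le \frac{L}{2} \|p_k(\Delta)\|^3,$$

thus for any $\Delta > 0$ and $k \ge M$,

$$\left| \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1 \right| \le \frac{\frac{L}{2} \|p_k(\Delta)\|^3}{\sigma c_* \|p_k(\Delta)\|^2} = \frac{L \|p_k(\Delta)\|}{2 \sigma c_*} \le \frac{L \Delta}{2 \sigma c_*},$$

so by applying the lemma with $w(\Delta) = \dfrac{L \Delta}{2 \sigma c_*}$ we have that $\{\Delta_k\}$ is bounded away
from 0 and we are done.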

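Proof of III: Suppose to the contrary that $\lambda_1(H(x_*)) < 0$. By the uniform
continuity of $H$, for any $\Delta > 0$, and any $k$,

$$\left| \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1 \right| \le \frac{\|p_k(\Delta)\|^2 \, \bar{w}(\Delta)}{\mathrm{pred}_k(\Delta)},$$

where

$$\bar{w}(\Delta) = \int_0^1 \| H(x_k + \xi p_k(\Delta)) - H(x_k) \| (1 - \xi)\, d\xi$$

and thus $\lim_{\Delta \to 0^+} \bar{w}(\Delta) = 0$.

Find $M$ such that if $k \ge M$, $\lambda_1(B_k) < \dfrac{\lambda_1(H(x_*))}{2} < 0$. By Condition #2, for all
$k \ge M$, and for all $\Delta > 0$,

$$\mathrm{pred}_k(\Delta) \ge \bar{c}_2 (-\lambda_1(B_k)) \Delta^2 \ge \bar{c}_2 \left( -\lambda_1(H(x_*)) / 2 \right) \Delta^2,$$

so since $\|p_k(\Delta)\| \le \Delta$, the lemma applies with $w(\Delta) = \dfrac{2 \bar{w}(\Delta)}{\bar{c}_2 (-\lambda_1(H(x_*)))}$.
Thus, $\{\Delta_k\}$ is bounded away from 0.

But,

$$\mathrm{ared}_k(\Delta_k) \ge \eta_1 \mathrm{pred}_k(\Delta_k) \ge \eta_1 \bar{c}_2 \left( -\lambda_1(H(x_*)) / 2 \right) \Delta_k^2,$$

and since $f$ is bounded below $\mathrm{ared}_k(\Delta_k)$ converges to 0, so $\Delta_k$ converges to 0,
which is a contradiction. Hence, $\lambda_1(H(x_*)) \ge 0$. This concludes the proof of
Theorem 2.2.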


The results of this theorem also apply to different shapes of trust region.
Specifically we may wish to use a trust region defined by $\|D_k p\| \le \Delta$ for some
non-singular square matrix $D_k$ such that $\|D_k\|$ and $\|D_k^{-1}\|$ are uniformly
bounded in $k$. This satisfies the conditions of Algorithm 2.1 and Theorem 2.2
since if we make a change of variables replacing $\Delta$ by $\Delta$ times the upper bound
on $\|D_k^{-1}\|$ then $\|p_k\| \le \Delta$, and the conditions otherwise do not involve $\|p\|$.
The conditions are also not restricted to Euclidean norm and Theorem 2.2
applies as well to rectangular trust regions.




3. Some Permissible Trust Region Updating Strategies


The conditions on the trust region radius $\Delta_k$ that we gave in step 2) of
Algorithm 2.1 were chosen to be near minimal conditions that allow us to prove the
results of Theorem 2.2. Obviously in implementing an algorithm involving trust
regions, there are many detailed considerations in choosing and adjusting the
trust region radius that we have not considered so far in this paper. Our
purpose in Algorithm 2.1 was to set forth conditions that apply to almost any
reasonable strategy. Here we indicate more specifically what types of strategies
are covered.

Most approaches for choosing and adjusting the radius $\Delta_k$ follow the same
general pattern. Iteration $k$ of the algorithm begins with an initial trust
radius which defines a step $p$. If this step is unsatisfactory, a sequence of smaller
radii is tried until a satisfactory one is found. If the step $p$ is satisfactory, it
may be used or a larger trial trust region radius tried. At the next iterate
$x_{k+1} = x_k + p_k$ a new initial trust radius is generated.

To choose the initial trial radius at the k-th iteration, Algorithm 2.1 only
requires that two conditions be met. First, the initial trial radius can be smaller
than the final radius used for the previous step only if the previous step failed
the sufficient decrease condition, i.e.

$$\frac{\mathrm{ared}_{k-1}(\Delta_{k-1})}{\mathrm{pred}_{k-1}(\Delta_{k-1})} < \eta_2 .$$

Second, in this case the ratio between the previous $\Delta_{k-1}$ and the new trial radius
must be bounded by some constant that is fixed for the entire algorithm. These
possibilities are covered by the condition b) in step 2) of Algorithm 2.1.
Algorithm 2.1 allows the possibility of making the initial trial radius larger than $\Delta_{k-1}$
by any method chosen, if that seems advantageous. Clearly some methods for
doing this could be very inefficient, but from the point of view of global
convergence any increase is allowable.

One method for choosing the initial trial trust region at the k-th iteration
which Algorithm 2.1 does not cover is basing the radius on the length of the
previous step $p_{k-1}$ even when $p_{k-1}$ falls in the interior of the trust region $\Delta_{k-1}$. We
see little justification for this strategy, and including it in our theory, if possible,
would make the analysis more cumbersome.

Given the initial trial radius at the k-th iteration, a sequence of trial radii
may be tried until a satisfactory one is found. Algorithm 2.1 only requires that
the trial radius be reduced when the previous trial step fails to satisfy the
condition a) in step 2) of Algorithm 2.1 and only in this case, and that the reduction
be bounded below by a constant that is fixed for the entire algorithm. This case
is covered by the condition

$$\Delta \le \frac{1}{\gamma_1} \Delta_k \quad\text{and}\quad \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} < \eta_1$$

in Algorithm 2.1. Of course, the trust region ultimately used must satisfy this
condition.

The  conditions  of  Algorithm  2.1  also  allow  successively  larger  trial  trust 
regions  to  be  tried  within  the  k-th  iteration  whenever  this  seems  advantageous. 
There  is  no  restriction  on  the  method  used  to  increase  the  trial  radius,  nor  on 
the  amount  of  the  increase,  as  long  as  the  final  one  used  satisfies  condition  a)  of 
step  2)  in  Algorithm  2.1.  Notice  that  it  is  not  necessary  to  increase  the  trust 
region  at  any  point.  Never  increasing  the  trust  region  may  cause  great 
inefficiency,  but  convergence  is  still  assured. 


4. Some Permissible Step Selection Strategies

In this section we present three lemmas describing useful conditions under
which the step $p_k(\Delta)$ in Algorithm 2.1 will satisfy Conditions #1 and #2. Using
these lemmas we will see that a number of different methods for computing
steps yield first and second order stationary point convergent trust region type
algorithms.

First let us mention two types of step selection strategies that have been
used in trust region algorithms to which we will refer.

The "optimal" trust region step selection strategy is to take

$$p_k(\Delta_k) = \arg\min \{\, f_k + g_k^T w + \tfrac{1}{2} w^T B_k w : \|w\| \le \Delta_k \,\}. \qquad (4.1)$$

This strategy has been discussed and used by many authors, see e.g. Hebden
[1973], Moré [1978], Sorensen [1980], and Gay [1981]. If $B_k$ is positive definite and
$\|-B_k^{-1} g_k\| \le \Delta_k$, then $p_k = -B_k^{-1} g_k$ is the solution to (4.1). Otherwise, $p_k$ satisfies
$(B_k + \alpha_k I) p_k = -g_k$, for some non-negative $\alpha_k$ such that $(B_k + \alpha_k I)$ is at least
positive semi-definite and $\|p_k\| = \Delta_k$. If $B_k$ is positive definite, then so is $(B_k + \alpha_k I)$
and

$$p_k = -(B_k + \alpha_k I)^{-1} g_k, \qquad (4.2)$$

where $\alpha_k$ is uniquely determined by $\|p_k\| = \Delta_k$. If $B_k$ has a negative eigenvalue,
then $p_k$ is still of the form (4.2) unless $g_k$ is orthogonal to the null space of
$(B_k - \lambda_1 I)$ and $\|(B_k - \lambda_1 I)^{+} g_k\| < \Delta_k$; here the superscript $+$ denotes the
generalized inverse and $\lambda_1$ denotes the most negative eigenvalue of $B_k$. In this case,
which Moré and Sorensen [1981] refer to as the "hard case",
$p_k = -(B_k - \lambda_1 I)^{+} g_k + \xi_k v_k$, where $v_k$ is any eigenvector of $B_k$ corresponding to the
eigenvalue $\lambda_1$, and $\xi_k$ is chosen so that $\|p_k\| = \Delta_k$. The lemmas of this section
will lead to algorithms that are similar to this "optimal" algorithm and have the
same convergence properties but are considerably easier to implement.
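
To make the role of the shift in (4.2) concrete, the following minimal Python sketch computes the step for the special case of a diagonal matrix B = diag(b), where the eigenvalues are just the diagonal entries. The restriction to diagonal B, the use of plain bisection, and the name `optimal_step_diag` are assumptions made for illustration; the "hard case" is deliberately not handled, and practical codes use more refined iterations.

```python
import math

def optimal_step_diag(g, b, delta, iters=200):
    """Step of the form (4.2) for B = diag(b); returns (p, alpha).

    Assumes we are not in the hard case, i.e. g has a nonzero component
    along an eigenvector for the smallest eigenvalue when it is negative.
    """
    lam1 = min(b)

    def step(alpha):
        return [-gi / (bi + alpha) for gi, bi in zip(g, b)]

    def length(alpha):
        return math.sqrt(sum(pi * pi for pi in step(alpha)))

    if lam1 > 0.0 and length(0.0) <= delta:
        return step(0.0), 0.0            # Newton step lies inside the region
    lo = max(0.0, -lam1)                 # keeps B + alpha*I positive semi-definite
    hi = lo + 1.0
    while length(hi) > delta:            # enlarge hi until the step fits
        hi = lo + 2.0 * (hi - lo)
    for _ in range(iters):               # bisection on ||p(alpha)|| = delta
        mid = 0.5 * (lo + hi)
        if length(mid) > delta:
            lo = mid
        else:
            hi = mid
    return step(hi), hi
```

For an indefinite B the returned shift exceeds the negative of the smallest eigenvalue and the step lies on the boundary of the trust region.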
The second type of trust region step selection strategy includes the dogleg
type algorithms of Powell [1970] and Dennis and Mei [1979]. These algorithms
are defined in the case when $B_k$ is positive definite and always choose
$p_k \in [-g_k, -B_k^{-1} g_k]$. When $\Delta_k \ge \|-B_k^{-1} g_k\|$, $p_k$ is the Newton step $-B_k^{-1} g_k$; when
$\Delta_k \le \dfrac{g_k^T g_k}{g_k^T B_k g_k} \|g_k\|$, $p_k$ is the steepest descent step of length $\Delta_k$; when
$\Delta_k \in \left( \dfrac{g_k^T g_k}{g_k^T B_k g_k} \|g_k\|, \; \|-B_k^{-1} g_k\| \right)$, $p_k$ is the step of length $\Delta_k$ on a specified piecewise
linear curve connecting $-\dfrac{g_k^T g_k}{g_k^T B_k g_k} g_k$ and $-B_k^{-1} g_k$ (see Dennis and Schnabel
[1983] for further explanation). The lemmas of this section will lead to natural
and efficient extensions of these algorithms to the indefinite case which satisfy
the conditions of Theorem 2.2 for second order stationary point convergence.
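
The three cases of a dogleg step can be sketched as follows. This is a hedged illustration, not any author's implementation: it uses the simplest variant, in which the piecewise linear curve is the single segment from the steepest descent minimizer to the Newton step, and it is restricted to a 2x2 positive definite matrix so that the Newton system can be solved by Cramer's rule. The function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def newton_step_2x2(g, B):
    """Solve B p = -g for a 2x2 matrix B by Cramer's rule."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [(-g[0] * d + b * g[1]) / det, (c * g[0] - a * g[1]) / det]

def dogleg_step(g, B, delta):
    """Single dogleg step; assumes B is symmetric positive definite."""
    p_newton = newton_step_2x2(g, B)
    if norm(p_newton) <= delta:
        return p_newton                  # Newton step fits in the region
    Bg = [dot(row, g) for row in B]
    t = dot(g, g) / dot(g, Bg)           # minimizer of the model along -g
    p_cauchy = [-t * gi for gi in g]
    if norm(p_cauchy) >= delta:
        return [-delta * gi / norm(g) for gi in g]
    # Otherwise find tau in (0, 1) with
    # ||p_cauchy + tau * (p_newton - p_cauchy)|| = delta.
    d = [pn - pc for pn, pc in zip(p_newton, p_cauchy)]
    a = dot(d, d)
    b = 2.0 * dot(p_cauchy, d)
    c = dot(p_cauchy, p_cauchy) - delta * delta
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [pc + tau * di for pc, di in zip(p_cauchy, d)]
```

In every case the returned step lies in the subspace spanned by the steepest descent direction and the Newton step.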


The first lemma gives a very general condition on the step at each iteration
that ensures satisfaction of Condition #1, and hence first order stationary point
convergence. By way of motivation we note that if an algorithm simply took the
"best gradient step", i.e. the solution to

$$\min \{\, g_k^T w + \tfrac{1}{2} w^T B_k w : \|w\| \le \Delta, \; w \in [g_k] \,\},$$

then it would satisfy Condition #1. Lemma 4.3 is a slight generalization of this
fact.
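
As a numerical sanity check of this remark, the sketch below evaluates the predicted reduction of the best gradient step on randomly generated symmetric 2x2 matrices and compares it with the bound of Condition #1. The constants 1/2 and 1 used for the two constants of Condition #1 are an assumption (they are the values obtained by the standard argument for the steepest descent direction), and the sampling ranges and function names are arbitrary choices for illustration.

```python
import math
import random

def best_gradient_pred(g, B, delta):
    """Predicted reduction of the model minimizer along -g within radius delta."""
    gg = sum(gi * gi for gi in g)
    gn = math.sqrt(gg)
    Bg = [sum(bij * gj for bij, gj in zip(row, g)) for row in B]
    curv = sum(gi * bi for gi, bi in zip(g, Bg))      # g^T B g
    t = delta / gn                                    # boundary step length
    if curv > 0.0:
        t = min(t, gg / curv)                         # interior minimizer
    return t * gg - 0.5 * t * t * curv                # pred for p = -t g

def spectral_norm_sym2(B):
    """Spectral norm of a symmetric 2x2 matrix from its eigenvalues."""
    a, b, d = B[0][0], B[0][1], B[1][1]
    mean, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    return max(abs(mean + rad), abs(mean - rad))

def condition1_holds(trials=1000, seed=0):
    """Check pred >= 0.5 * ||g|| * min(delta, ||g|| / ||B||) on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        g = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
        a, b, d = (rng.uniform(-5, 5) for _ in range(3))
        B = [[a, b], [b, d]]
        delta = rng.uniform(0.01, 10.0)
        gn = math.hypot(g[0], g[1])
        bound = 0.5 * gn * min(delta, gn / spectral_norm_sym2(B))
        if best_gradient_pred(g, B, delta) < bound * (1.0 - 1e-12):
            return False
    return True
```

The check includes indefinite matrices, for which the best gradient step always goes to the boundary of the trust region.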


Here we slightly change our earlier notation and let

$$\mathrm{pred}(s) = -g^T s - \tfrac{1}{2} s^T B s .$$

Lemma 4.3

Suppose there is a constant $c_1 \in (0, 1]$ such that at each iteration $k$,

$$\mathrm{pred}(p_k(\Delta)) \ge -\min \{\, g_k^T w + \tfrac{1}{2} w^T B_k w : \|w\| \le \Delta, \; w \in [d_k] \,\},$$

for some $d_k$ satisfying

$$d_k^T g_k \le -c_1 \|d_k\| \, \|g_k\| .$$

Then $p_k(\Delta)$ satisfies Condition #1, and hence a trust region algorithm using it is
first order stationary point convergent.


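Proof: We will drop the subscripts $k$ throughout and will show that

$$\mathrm{pred}(s_*) \ge \frac{c_1}{2} \|g\| \min\left( \Delta, \frac{c_1 \|g\|}{\|B\|} \right),$$

where $s_*$ solves the above minimization problem. This will clearly imply
satisfaction of Condition #1 by $p(\Delta)$, since $\mathrm{pred}(p(\Delta)) \ge \mathrm{pred}(s_*)$, by assumption.

Define $h(\alpha) = -\mathrm{pred}(\alpha d) = \alpha g^T d + \dfrac{\alpha^2}{2} d^T B d$. Then $h'(\alpha) = \alpha d^T B d + g^T d$, and
$h''(\alpha) = d^T B d$.

Let $s_* = \alpha_* d$, i.e. $\alpha_*$ is the multiple of $d$ which minimizes the quadratic
$g^T w + \tfrac{1}{2} w^T B w$ along that direction, subject to the constraint $\|w\| \le \Delta$. Now, if
$d^T B d > 0$, then either $\alpha_* = \dfrac{-g^T d}{d^T B d}$ if $\dfrac{-g^T d}{d^T B d} \le \dfrac{\Delta}{\|d\|}$, or else $\alpha_* = \dfrac{\Delta}{\|d\|}$. In the first case
we have

$$\mathrm{pred}(s_*) = \mathrm{pred}(\alpha_* d) = \frac{1}{2} \frac{(g^T d)^2}{d^T B d} \ge \frac{1}{2} \frac{c_1^2 \|g\|^2 \|d\|^2}{\|B\| \, \|d\|^2} = \frac{c_1^2}{2} \frac{\|g\|^2}{\|B\|} .$$

In the second case, we have

$$\mathrm{pred}(s_*) = -\frac{\Delta}{\|d\|} g^T d - \frac{1}{2} \frac{\Delta^2}{\|d\|^2} d^T B d \ge -\frac{1}{2} \frac{\Delta}{\|d\|} g^T d \ge \frac{c_1}{2} \Delta \|g\|$$

(with the first inequality above true since $\dfrac{\Delta}{\|d\|} d^T B d < -g^T d$).

Finally, if $d^T B d \le 0$, $\alpha_* = \dfrac{\Delta}{\|d\|}$ and so we have

$$\mathrm{pred}(s_*) \ge -\frac{\Delta}{\|d\|} g^T d \ge c_1 \Delta \|g\| .$$

Thus, $s_*$ and hence $p(\Delta)$ satisfy Condition #1, with constants $\bar{c}_1 = \dfrac{c_1}{2}$ and
$\sigma_1 = c_1$.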


We may summarize the lemma by saying that as long as an algorithm takes
steps which do as well on the quadratic model as directions with "sufficient"
descent, then Condition #1 is satisfied, and hence the algorithm is first order
stationary point convergent.

Using Lemma 4.3, we can immediately note first order stationary point
convergence for a number of algorithms. The lemma can be used to prove the first
order stationary point convergence of most line search algorithms which keep
the angle between the steps and the gradient bounded away from 90 degrees,
because the step length adjusting strategy and step acceptance strategy in the
line search can be shown to correspond to a trust radius adjusting strategy and
step acceptance strategy allowed by Algorithm 2.1. In addition, it applies to any
dogleg type algorithm, e.g. Powell [1970] and Dennis and Mei [1979], since these
algorithms always do at least as well as the "best gradient step". Finally, we
note that the lemma applies immediately to the "optimal" algorithm, described
above, for the same reason.

The next lemma says, roughly, that if each step taken by the algorithm
gives as much descent as a direction of sufficient negative curvature, when
there is one, then Condition #2 is satisfied.


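Lemma 4.4

Suppose there is a constant $c_2 \in (0, 1]$ such that at each iteration $k$ where
$\lambda_1(H(x_k)) < 0$, we have $B_k = H(x_k)$ and

$$\mathrm{pred}(p_k(\Delta)) \ge \mathrm{pred}(t_k),$$

where

$$t_k = \arg\min \{\, g_k^T w + \tfrac{1}{2} w^T B_k w : \|w\| \le \Delta, \; w \in [q_k] \,\},$$

for some $q_k$ satisfying

$$q_k^T B_k q_k \le c_2 \lambda_1(H(x_k)) \|q_k\|^2 .$$

Then $p_k(\Delta)$ satisfies Condition #2.

Proof: We have just to show that for some $\bar{c}_2 > 0$, $\mathrm{pred}(t_k) \ge \bar{c}_2 (-\lambda_1(H(x_k))) \Delta^2$, for all
iterations with $\lambda_1(H(x_k)) < 0$. Again, we will drop the subscripts $k$.

Define $w = -\mathrm{sgn}(g^T q) \dfrac{\Delta}{\|q\|} q$. Then

$$\mathrm{pred}(w) = \frac{\Delta}{\|q\|} |g^T q| - \frac{\Delta^2}{2 \|q\|^2} q^T B q \ge -\frac{\Delta^2}{2 \|q\|^2} q^T B q \ge \frac{c_2}{2} (-\lambda_1(H(x))) \Delta^2,$$

since $q^T B q \le c_2 \lambda_1(H(x)) \|q\|^2$. So, since $\mathrm{pred}(w) \le \mathrm{pred}(t_k) \le \mathrm{pred}(p_k(\Delta))$, $p_k(\Delta)$
satisfies Condition #2 with $\bar{c}_2 = \dfrac{c_2}{2}$.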
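
The step along a direction of negative curvature used in this proof is easy to evaluate directly. The sketch below computes its predicted reduction for a direction q and radius delta. The handling of the tie case where q is orthogonal to g (the positive direction is taken) and the function name are assumptions made for illustration.

```python
import math

def negative_curvature_pred(g, B, q, delta):
    """pred(w) for w = -sgn(g^T q) * (delta / ||q||) * q."""
    gq = sum(gi * qi for gi, qi in zip(g, q))
    sign = -1.0 if gq > 0.0 else 1.0      # assumption: use +q when g^T q = 0
    qn = math.sqrt(sum(qi * qi for qi in q))
    w = [sign * delta * qi / qn for qi in q]
    Bw = [sum(bij * wj for bij, wj in zip(row, w)) for row in B]
    linear = -sum(gi * wi for gi, wi in zip(g, w))          # -g^T w >= 0
    quadratic = -0.5 * sum(wi * bi for wi, bi in zip(w, Bw))  # -w^T B w / 2
    return linear + quadratic
```

With B = diag(1, -3), q = (0, 1), g = (1, 2) and delta = 0.5, the step is w = (0, -0.5), the linear term contributes 1 and the curvature term 0.375, so the predicted reduction exceeds the bound (1/2)(3)(0.25) = 0.375 obtained in the proof with c2 = 1.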


So, if the steps taken by an algorithm satisfy the hypotheses of both
Lemmas 4.3 and 4.4, then the algorithm is second order stationary point convergent.
For example, if an algorithm uses any steps giving as much descent as

$$s = \arg\min \{\, g_k^T w + \tfrac{1}{2} w^T B_k w : \|w\| \le \Delta, \; w \in [d_k, q_k] \,\},$$

where $d_k$ satisfies the requirement in Lemma 4.3, and $q_k$ satisfies the
requirement in Lemma 4.4 when $\lambda_1(H(x_k)) < 0$ and is 0 otherwise, then it satisfies both
Conditions #1 and #2. One such algorithm is mentioned in Section 5.

Finally, we note that Lemma 4.4 applies to the "optimal" algorithm
(Sorensen [1980]), since this algorithm always achieves at least as much descent as is
possible in the eigenvector direction corresponding to the most negative
eigenvalue of $H(x_k)$. Taken together with Theorem 2.2, the two lemmas prove that
the "optimal" algorithm is second order stationary point convergent.

Lemmas 4.3 and 4.4 can also be used to show convergence of algorithms using scaled trust regions of the form $\{\, t : \|D_k t\| \le \Delta \,\}$, where $D_k$ is a positive diagonal scaling matrix that may change at every iteration. If we are using such a scaled region to determine a step otherwise satisfying the conditions of Lemma 4.3, then we are requiring

$$s_k = \operatorname{argmin}\{\, s^T g_k + \tfrac12 s^T B_k s : \|D_k s\| \le \Delta,\ s \in [d_k] \,\}.$$

This satisfies the conditions of Lemma 4.3 as stated but with $\Delta$ replaced by $\Delta / \|D_k\|$. Then by the Lemma, Condition #1 is satisfied with $c_1$ replaced by $c_1 / \|D_k\|$, and similarly for $\sigma_1$. The same argument with Lemma 4.4 shows that Condition #2 remains satisfied with a modified trust region. Thus if we require that $\|D_k\|$ and $\|D_k^{-1}\|$ be bounded for all $k$, then the convergence results from Lemmas 4.3 and 4.4 also apply when using such a scaled trust region. They also apply to steps using trust regions based on other norms, such as $l_1$ or $l_\infty$.
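The reason both norms need to be bounded can be seen from a pair of inclusions. This is a standard observation, stated here for a norm and its induced matrix norm as an illustration rather than quoted from the paper:

$$\{\, s : \|s\| \le \Delta / \|D_k\| \,\} \subseteq \{\, s : \|D_k s\| \le \Delta \,\} \subseteq \{\, s : \|s\| \le \|D_k^{-1}\|\, \Delta \,\},$$

since $\|D_k s\| \le \|D_k\|\, \|s\|$ and $\|s\| = \|D_k^{-1} D_k s\| \le \|D_k^{-1}\|\, \|D_k s\|$. The scaled region therefore always contains a ball whose radius is proportional to $\Delta$ and is contained in another such ball, with proportionality factors that stay away from zero and infinity exactly when $\|D_k\|$ and $\|D_k^{-1}\|$ are bounded.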


The  final  lemma  contains  a  different  set  of  sufficient  conditions  for  a  step 
computing  method  to  satisfy  both  Conditions  #1  and  #2.  These  conditions  are 
related  to  the  step  (4.2)  of  the  "optimal"  algorithm;  however  Lemma  4.5  is 
broad  enough  to  prove  the  second  order  stationary  point  convergence  of  a 
variety  of  algorithms,  including  several  discussed  in  Sections  5  and  6. 


Lemma 4.5

Suppose $B_k = H(x_k)$ and $p_k(\Delta)$ satisfies Condition #1 whenever $\lambda_1(H(x_k)) \ge 0$. Suppose further that there exist constants $c_3 > 1$ and $c_4 \in (0,1]$ such that whenever $\lambda_1(H(x_k)) < 0$, for some $\alpha_k \in (-\lambda_1(H(x_k)),\ c_3 \max(|\lambda_1|, \lambda_n)]$, $p_k(\Delta)$ satisfies:

i) if $\Delta < \|(B_k + \alpha_k I)^{-1} g_k\|$, then $p_k(\Delta)$ is any step satisfying Conditions #1 and #2;

ii) if $\Delta = \|(B_k + \alpha_k I)^{-1} g_k\|$, then $p_k(\Delta) = -(B_k + \alpha_k I)^{-1} g_k$;

iii) if $\Delta > \|(B_k + \alpha_k I)^{-1} g_k\|$, then $p_k(\Delta) = -(B_k + \alpha_k I)^{-1} g_k + \xi q_k$, for some $q_k$ satisfying $q_k^T B_k q_k \le c_4 \lambda_1(B_k) \|q_k\|^2$, where $\xi \in R$ is chosen so that $\|p_k(\Delta)\| = \Delta$ and $\mathrm{sgn}(\xi) = -\mathrm{sgn}(q_k^T (B_k + \alpha_k I)^{-1} g_k)$.

Then $p_k(\Delta)$ also satisfies Conditions #1 and #2 whenever $\lambda_1(H(x_k)) < 0$, and thus an algorithm using $p_k(\Delta)$ is second order stationary point convergent.
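The choice of $\xi$ in case iii) amounts to picking one root of a scalar quadratic. The sketch below is an illustration under stated assumptions, not the authors' code: the name `boundary_xi` is hypothetical, and `r` stands for $-(B_k + \alpha_k I)^{-1} g_k$, so that the sign rule of the lemma reads $\mathrm{sgn}(\xi) = \mathrm{sgn}(q^T r)$. When $q^T r = 0$ either root satisfies the rule and the positive one is returned.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def boundary_xi(r, q, delta):
    """Return xi with ||r + xi*q|| = delta and xi * (q'r) >= 0.

    Assumes ||r|| < delta and q != 0, so the quadratic
        ||q||^2 xi^2 + 2 (q'r) xi + (||r||^2 - delta^2) = 0
    has one positive and one negative root.
    """
    a = dot(q, q)
    b = dot(q, r)
    c = dot(r, r) - delta ** 2
    if a == 0.0 or c >= 0.0:
        raise ValueError("need q != 0 and ||r|| < delta")
    root = math.sqrt(b * b - a * c)      # root > |b| because a * c < 0
    # Positive root when q'r >= 0, negative root otherwise.
    return (-b + root) / a if b >= 0 else (-b - root) / a

# Illustrative usage with made-up vectors.
r = [0.1, -0.2, 0.3]
q = [1.0, 2.0, -1.0]
xi = boundary_xi(r, q, 2.0)
p = [ri + xi * qi for ri, qi in zip(r, q)]
print(xi, math.sqrt(dot(p, p)))
```

The selected root moves from `r` toward the boundary in the direction in which `q` and `r` agree, which is what makes the cross term in the proof of the lemma nonnegative.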

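Proof: We will drop the subscripts $k$, and call $\lambda_1 = \lambda_1(H(x_k))$. We will first show that the step in iii) satisfies Conditions #1 and #2, and then see from the same calculation that the step in ii) satisfies these conditions.

If $p(\Delta) = -(B + \alpha I)^{-1} g + \xi q$, then by simple algebraic manipulation we have that

$$
\begin{aligned}
\mathrm{pred}(p(\Delta)) &= -g^T (\xi q - (B+\alpha I)^{-1} g) - \tfrac12 (\xi q - (B+\alpha I)^{-1} g)^T B (\xi q - (B+\alpha I)^{-1} g) \\
&= g^T (B+\alpha I)^{-1} g - \xi g^T q - \tfrac{\xi^2}{2} q^T B q + \xi q^T B (B+\alpha I)^{-1} g - \tfrac12 g^T (B+\alpha I)^{-1} B (B+\alpha I)^{-1} g \\
&= \tfrac12 g^T (B+\alpha I)^{-1} g - \tfrac{\xi^2}{2} q^T B q - \alpha \xi q^T (B+\alpha I)^{-1} g + \tfrac{\alpha}{2} \|(B+\alpha I)^{-1} g\|^2 \\
&\ge \tfrac12 g^T (B+\alpha I)^{-1} g - \tfrac{c_4 \lambda_1 \xi^2}{2} \|q\|^2 - \alpha \xi q^T (B+\alpha I)^{-1} g + \tfrac{\alpha}{2} \|(B+\alpha I)^{-1} g\|^2 \\
&= \tfrac12 g^T (B+\alpha I)^{-1} g - \tfrac{c_4 \lambda_1}{2} \|\xi q - (B+\alpha I)^{-1} g\|^2 \\
&\qquad + (-c_4 \lambda_1 - \alpha)\, \xi q^T (B+\alpha I)^{-1} g + \left( \tfrac{\alpha}{2} + \tfrac{c_4 \lambda_1}{2} \right) \|(B+\alpha I)^{-1} g\|^2 \\
&\ge \tfrac12 g^T (B+\alpha I)^{-1} g + \tfrac{c_4}{2} (-\lambda_1) \|p(\Delta)\|^2,
\end{aligned}
$$

since the last two terms in the next to last expression above are nonnegative due to $\alpha > -\lambda_1 \ge -c_4 \lambda_1$ and $\xi q^T (B+\alpha I)^{-1} g \le 0$.

So, we see that

$$\mathrm{pred}(p(\Delta)) \ge \tfrac12 g^T (B+\alpha I)^{-1} g + \tfrac{c_4}{2} (-\lambda_1) \Delta^2,$$

and since the first quantity is nonnegative, Condition #2 is clearly satisfied. Also,

$$\mathrm{pred}(p(\Delta)) \ge \tfrac12 g^T (B+\alpha I)^{-1} g \ge \frac{\|g\|^2}{2 \|B + \alpha I\|} \ge \frac{1}{2(c_3 + 1)} \frac{\|g\|^2}{\|B\|},$$

with the last inequality due to

$$\|B + \alpha I\| = \lambda_n + \alpha \le \lambda_n + c_3 \max(|\lambda_1|, \lambda_n) \le (c_3 + 1) \|B\|.$$

So, Condition #1 is also satisfied.

Finally, note that in case ii), we can take $\xi = 0$, and the same calculations yield satisfaction of Conditions #1 and #2 by the step in ii).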
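The algebra in this proof is easy to spot-check numerically. The sketch below is only an illustration: it assumes a diagonal $B$ so that $(B + \alpha I)^{-1} g$ can be formed elementwise, and the data, the value $c_4 = 1$, and the function names are all made up for the example. It compares the predicted reduction computed directly with the expanded closed form, and then with the final lower bound.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pred_direct(g, b_diag, p):
    """-g'p - 0.5 p'Bp for B = diag(b_diag)."""
    Bp = [bi * pi for bi, pi in zip(b_diag, p)]
    return -dot(g, p) - 0.5 * dot(p, Bp)

def pred_expanded(g, b_diag, alpha, q, xi):
    """Closed form of pred(xi*q - (B + alpha I)^{-1} g) from the proof."""
    u = [gi / (bi + alpha) for gi, bi in zip(g, b_diag)]  # (B + alpha I)^{-1} g
    qBq = sum(bi * qi * qi for bi, qi in zip(b_diag, q))
    return (0.5 * dot(g, u) - 0.5 * xi ** 2 * qBq
            - alpha * xi * dot(q, u) + 0.5 * alpha * dot(u, u))

# Illustrative data: B = diag(-1, 2, 3), so lambda_1 = -1 and lambda_n = 3.
b_diag = [-1.0, 2.0, 3.0]
g = [0.2, 1.0, -1.0]
alpha, c4, lam1 = 1.5, 1.0, -1.0          # alpha > -lambda_1
q = [1.0, 0.0, 0.0]                       # q'Bq = lambda_1 * ||q||^2
u = [gi / (bi + alpha) for gi, bi in zip(g, b_diag)]
xi = -0.5                                 # sign opposite to q'u, as in iii)
p = [xi * qi - ui for qi, ui in zip(q, u)]

direct = pred_direct(g, b_diag, p)
expanded = pred_expanded(g, b_diag, alpha, q, xi)
lower = 0.5 * dot(g, u) + 0.5 * c4 * (-lam1) * dot(p, p)
print(direct, expanded, lower)
```

Here the trust radius is implicitly $\Delta = \|p\|$, so `lower` is the right-hand side of the last line of the displayed chain.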

The value of Lemma 4.5 is that it suggests many algorithms that are second order stationary point convergent but are relatively efficient to implement. The reader may have recognized that conditions ii) and iii) of Lemma 4.5 just give an easy-to-implement way to identify the "hard case" in a second order algorithm, and to choose a step in this case. The inequality concerning $q_k$ in iii) says that $q_k$ must be a direction of sufficient negative curvature. The inequality concerning $\alpha_k$ says that we can overestimate the magnitude of $\lambda_1(H(x_k))$ by an amount proportional to $\|H(x_k)\|$ and still achieve global convergence. When we are not in this "hard case" Lemma 4.5 says that we have great leeway in choosing the step $p_k$. The algorithms of Section 5 are mainly based on Lemma 4.5.


5.  New  Algorithms  That  Use  Negative  Curvature 

In this section we present several idealized step selection strategies for problem (1.1) which use second order information. The step selection strategies are all based on the lemmas of Section 4 and so any algorithm that uses one of them within the framework of Algorithm 2.1 achieves second order stationary point convergence. They are idealized only in the sense that they may use the largest and smallest eigenvalues of the Hessian matrix and a direction of sufficient negative curvature $q_k$ without specifying how these quantities are to be computed. In Section 6 we will suggest a possible implementation of one of these algorithms, including the computation of the extreme eigenvalues and negative curvature direction when required.

Before describing the step selection strategies we turn briefly to the question of judging these strategies. So far we have been concerned with convergence properties. We now consider two other factors, the computational work involved in calculating the step and the continuity of the step selection strategy. We define a continuous step selection strategy to be one where the function $p(g, B, \Delta)$ is a continuous function of $g$, $B$, and $\Delta$. We note that the "optimal" strategy in Sorensen [1980] has this property except in the highly unusual case that the algorithm is at a point $x$ with $\lambda_1(H(x)) = 0$, $g$ orthogonal to the null space of $H(x)$, and $\|H(x)^+ g\| < \Delta$. All of the strategies to follow will have the same property, except as otherwise noted. As for the computational work, the algorithm we present in Section 6 should be quite efficient in terms of arithmetic operations required per step.

The first step selection strategy shows how a line search using second order information can be extended to the indefinite case in a natural way that satisfies the conditions of Lemma 4.5 and so assures second order stationary point convergence. The strategy is related to an algorithm by Gill and Murray [1972].

In all of the following, let $B_k = H(x_k)$.


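Algorithm 5.1 Indefinite Line Search Step

Let $\kappa \gg 1$, $\kappa \le 1 / (\text{machine } \epsilon)$.

a) When $\lambda_1(B_k) > 0$ and $\kappa_2(B_k) \le \kappa$ ($\kappa_2$ is the $l_2$ condition number),
   if $\|B_k^{-1} g_k\| \le \Delta$, then $p_k(\Delta) = -B_k^{-1} g_k$,
   otherwise $p_k(\Delta) = -\dfrac{\Delta}{\|B_k^{-1} g_k\|}\, B_k^{-1} g_k$.

b) When $\lambda_1(B_k) \le 0$ or $\kappa_2(B_k) > \kappa$, $\alpha_k$ is chosen such that $B_k + \alpha_k I$ is positive definite and $\kappa_2(B_k + \alpha_k I) = \kappa$, and $p_k(\Delta)$ is chosen by

   bi) if $\|(B_k + \alpha_k I)^{-1} g_k\| \ge \Delta$ or $\lambda_1(B_k) > 0$,
       then $p_k(\Delta) = -\dfrac{\Delta}{\|(B_k + \alpha_k I)^{-1} g_k\|}\, (B_k + \alpha_k I)^{-1} g_k$;

   bii) otherwise,
       $p_k(\Delta) = -(B_k + \alpha_k I)^{-1} g_k + \xi q_k$,
       where $\xi$ and $q_k$ are selected as in Lemma 4.5.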
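For a symmetric matrix with extreme eigenvalues $\lambda_1 < \lambda_n$, the requirement $\kappa_2(B_k + \alpha_k I) = \kappa$ in step b) determines the shift in closed form: solving $(\lambda_n + \alpha) / (\lambda_1 + \alpha) = \kappa$ gives $\alpha = (\lambda_n - \kappa \lambda_1) / (\kappa - 1)$, and then $\lambda_1 + \alpha = (\lambda_n - \lambda_1) / (\kappa - 1) > 0$. The sketch below assumes the exact extreme eigenvalues are available, which is the idealization made in this section; the function name is hypothetical.

```python
def shift_for_condition_number(lam_min, lam_max, kappa):
    """Return alpha with (lam_max + alpha) / (lam_min + alpha) == kappa.

    Assumes kappa > 1 and lam_max > lam_min (B is not a multiple of I).
    The shifted smallest eigenvalue is (lam_max - lam_min) / (kappa - 1) > 0,
    so B + alpha*I is positive definite.
    """
    if kappa <= 1.0 or lam_max <= lam_min:
        raise ValueError("need kappa > 1 and lam_max > lam_min")
    return (lam_max - kappa * lam_min) / (kappa - 1.0)

# Indefinite example: lambda_1 = -1, lambda_n = 3, target condition number 100.
alpha = shift_for_condition_number(-1.0, 3.0, 100.0)
print(alpha, (3.0 + alpha) / (-1.0 + alpha))
```

In the indefinite case this shift is larger than $-\lambda_1$, and for large $\kappa$ it exceeds $|\lambda_1|$ only by about $(\lambda_n - \lambda_1)/\kappa$.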


The second order stationary point convergence of any algorithm of the form of Algorithm 2.1 that chooses its steps by Algorithm 5.1 can trivially be proven by using Lemma 4.5 combined with Lemma 4.3. Note that the constant $\kappa$ that is used in Algorithm 5.1 could easily be replaced by some appropriate interval. Also, in order for the step selection strategy to be continuous as discussed above, $q_k$ must be a continuous function of $g_k$ and $B_k$.

The next two step selection strategies are extensions of the dogleg strategy to the indefinite case. Algorithm 5.2 shows how to construct a dogleg version of the "optimal" algorithm. It is not implementable, due to its use of the generalized inverse and the most negative eigenvalue and corresponding eigenvector of $B_k$. We include it in order to motivate Algorithm 5.3, which is similar but is really implementable, as we shall see in Section 6. Both steps are easily seen to satisfy the conditions of Lemma 4.5, with Lemma 4.3 again applying to the portion of the algorithm not specified in Lemma 4.5.

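Algorithm 5.2 Indefinite Dogleg Step A

a) When $\lambda_1(B_k) > 0$,

   $p_k(\Delta) = \operatorname{argmin}\{\, g_k^T w + \tfrac12 w^T B_k w : \|w\| \le \Delta,\ w \in [-g_k, -B_k^{-1} g_k] \,\}$.

b) When $\lambda_1(B_k) \le 0$,

   bi) if $g_k$ is not orthogonal to the null space of $B_k - \lambda_1 I$,
       or $\|(B_k - \lambda_1 I)^+ g_k\| \ge \Delta$,
       then $p_k(\Delta) = \operatorname{argmin}\{\, g_k^T w + \tfrac12 w^T B_k w : \|w\| = \Delta,\ w \in [-g_k, v_k] \,\}$,
       where $B_k v_k = \lambda_1 v_k$;

   bii) otherwise $p_k(\Delta) = -(B_k - \lambda_1 I)^+ g_k + \xi v_k$,
       where $\xi$ is selected so that $\|p_k(\Delta)\| = \Delta$.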

Of course, the step in a) could be replaced by a usual dogleg or double dogleg step, losing only the continuity of $p_k(\Delta)$ at $\lambda_1(B_k) = 0$. Also note that minimizing the quadratic model over a two-dimensional subspace involves performing the "optimal" algorithm when $n = 2$, or, equivalently, solving one fourth degree polynomial in one unknown, meaning that its computational cost is negligible.
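To make the two-dimensional subproblem concrete, here is a minimal sketch of the boundary version ($\|w\| = \Delta$) used in the bi) steps. It is not the authors' procedure: instead of solving the fourth degree polynomial, it uses the closed-form eigendecomposition of the reduced $2 \times 2$ matrix and bisection on $\mu$ in $\|(A + \mu I)^{-1} \hat g\| = \Delta$, with the hard case handled separately. The function names, tolerances, and the requirement that the two spanning vectors be nonzero and not parallel are assumptions of the sketch.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def min_on_circle(A, g, delta, tol=1e-12):
    """Minimize g'y + 0.5*y'Ay over ||y|| = delta for a symmetric 2x2 A."""
    (a, b), (_, d) = A
    mean, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    lam1, lam2 = mean - rad, mean + rad
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    v2 = (math.cos(theta), math.sin(theta))   # unit eigenvector for lam2
    v1 = (-v2[1], v2[0])                      # unit eigenvector for lam1
    g1, g2 = dot(v1, g), dot(v2, g)           # gradient in the eigenbasis
    gnorm = math.hypot(g1, g2)

    def combine(y1, y2):
        return (y1 * v1[0] + y2 * v2[0], y1 * v1[1] + y2 * v2[1])

    if gnorm <= tol:                          # g = 0: move along v1
        return combine(delta, 0.0)
    if rad <= tol * max(1.0, abs(mean)):      # A is a multiple of I
        return combine(-delta * g1 / gnorm, -delta * g2 / gnorm)
    y2_hard = -g2 / (lam2 - lam1)             # pseudo-inverse step at mu = -lam1
    if abs(g1) <= tol * gnorm and abs(y2_hard) <= delta:
        # Hard case: fill the remaining length along v1 (either sign works).
        return combine(math.sqrt(delta ** 2 - y2_hard ** 2), y2_hard)
    # Otherwise ||(A + mu I)^{-1} g|| decreases in mu on (-lam1, infinity).
    lo, hi = -lam1, -lam1 + gnorm / delta
    while True:
        mu = 0.5 * (lo + hi)
        if mu <= lo or mu >= hi:              # interval cannot be split further
            break
        if math.hypot(g1 / (lam1 + mu), g2 / (lam2 + mu)) > delta:
            lo = mu
        else:
            hi = mu
    return combine(-g1 / (lam1 + hi), -g2 / (lam2 + hi))

def min_over_subspace(B, g, s1, s2, delta):
    """Minimize g'w + 0.5*w'Bw over ||w|| = delta with w in span{s1, s2}."""
    n1 = math.sqrt(dot(s1, s1))
    e1 = [x / n1 for x in s1]
    t = [x - dot(e1, s2) * y for x, y in zip(s2, e1)]   # Gram-Schmidt
    n2 = math.sqrt(dot(t, t))
    if n2 <= 1e-12 * math.sqrt(dot(s2, s2)):
        raise ValueError("s1 and s2 are (nearly) parallel")
    e2 = [x / n2 for x in t]
    Be1 = [dot(row, e1) for row in B]
    Be2 = [dot(row, e2) for row in B]
    A = ((dot(e1, Be1), dot(e1, Be2)), (dot(e2, Be1), dot(e2, Be2)))
    y = min_on_circle(A, (dot(g, e1), dot(g, e2)), delta)
    return [y[0] * x1 + y[1] * x2 for x1, x2 in zip(e1, e2)]

# Illustrative usage: span{-g, v} for an indefinite diagonal B.
B = [[-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
g = [1.0, 1.0, 1.0]
w = min_over_subspace(B, g, [-1.0, -1.0, -1.0], [1.0, 0.0, 0.0], 1.0)
print(w, math.sqrt(dot(w, w)))
```

The work after forming the reduced matrix is independent of $n$, which is the sense in which the cost of the subspace minimization is negligible next to a factorization.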

The  following  is  the  Indefinite  Dogleg  Step  that  we  propose  in  practice. 
Again,  the  step  a)  for  the  positive  definite  case  could  be  replaced  by  a  normal 
dogleg  or  double  dogleg  step. 

Algorithm 5.3 Indefinite Dogleg Step B

a) When $\lambda_1(B_k) > 0$, do the same as in Dogleg A.

b) When $\lambda_1(B_k) \le 0$, let $\alpha_k$ be chosen as in Lemma 4.5,
   $r_k = -(B_k + \alpha_k I)^{-1} g_k$, and $p_k(\Delta)$ chosen by

   bi) if $\|r_k\| \ge \Delta$, then
       $p_k(\Delta) = \operatorname{argmin}\{\, g_k^T w + \tfrac12 w^T B_k w : \|w\| = \Delta,\ w \in [-g_k, r_k] \,\}$;

   bii) otherwise
       $p_k(\Delta) = r_k + \xi q_k$, where $\xi$ and $q_k$ are selected as in Lemma 4.5.

The advantage of Algorithm 5.3 is that it is fairly easy and efficient to implement, as we will show in Section 6, while also being a continuous step selection strategy that is second order stationary point convergent, and that it approximates the "optimal" step selection strategy to some extent.

Algorithm 5.4 shows how a simpler indefinite dogleg step can be constructed that satisfies the conditions of Lemmas 4.3 and 4.4 and so also achieves second order stationary point convergence.

Algorithm 5.4 Simple Indefinite Dogleg Step

a) When $\lambda_1(B_k) > 0$, do the same as Doglegs A and B.

b) When $\lambda_1(B_k) \le 0$, let $q_k$ satisfy

   $q_k^T B_k q_k \le c_4 \lambda_1(B_k) \|q_k\|^2$,

   where $c_4$ is a uniform constant for all $k$, as in Lemma 4.5, and $g_k^T q_k \le 0$, and let

   $p_k(\Delta) = \operatorname{argmin}\{\, g_k^T w + \tfrac12 w^T B_k w : \|w\| = \Delta,\ w \in [-g_k, q_k] \,\}$.

Algorithm 5.4 is not continuous as discussed above when $\lambda_1(B_k) = 0$ but if $q_k$ is reasonably chosen this will not be a problem, and the algorithm has the redeeming feature that it may be implemented so as to require no matrix factorizations for most indefinite iterations. However, Algorithm 5.4 might require more iterations than Algorithm 5.3 to solve the minimization problems. In Section 6 we propose an implementation of an algorithm that subsumes Algorithms 5.3 and 5.4.

Finally, we mention a slight generalization of the "optimal" step (Sorensen [1980]) that still leads to a second order stationary point convergent algorithm.

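Algorithm 5.5 Variation of "Optimal" Step

a) When $\lambda_1(B_k) > 0$, let $p_k(\Delta)$ be the "optimal" step.

b) When $\lambda_1(B_k) \le 0$, let $\alpha_k$ and $q_k$ be chosen as in Lemma 4.5,
   let $r_k = -(B_k + \alpha_k I)^{-1} g_k$, and

   bi) if $\|r_k\| \ge \Delta$, then $p_k(\Delta) = \operatorname{argmin}\{\, g_k^T w + \tfrac12 w^T B_k w : \|w\| = \Delta \,\}$;

   bii) otherwise $p_k(\Delta) = r_k + \xi q_k$, where $\xi$ is chosen so that $\|p_k\| = \Delta$.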

This step differs from the "optimal" step in that it uses $\alpha_k$, not necessarily a close estimate of the most negative eigenvalue, in identifying the hard case, and that it just uses the direction of negative curvature $q_k$ in this case, not necessarily an eigenvector corresponding to the most negative eigenvalue. This makes it considerably more efficient to implement in the hard case. The second order stationary point convergence follows obviously from Lemma 4.5.

6.  An Implementation of the Indefinite Dogleg Algorithm

In this section we will always use $B_k = H(x_k)$.

Now we present one possible implementation of the step selection strategy in Algorithm 5.3, both as an example of the sort of algorithm the theory has been aimed at, and as partial justification that such algorithms can be efficiently implemented.

Our implementation differs from Moré and Sorensen's [1981] in that it uses explicit approximations to the most negative eigenvalue $\lambda_1$ and corresponding eigenvector $v_1$. We claim that this approach may well be more efficient. The bulk of the computational work in most optimization algorithms, aside from function and derivative evaluations, is made up by matrix factorizations. In our implementation there is the additional work involved in obtaining the approximations to the largest and smallest eigenvalues and the most negative eigenvector. Computational experience shows that a good algorithm for this, e.g. the Lanczos method, can obtain approximations to outer eigenvalues and eigenvectors of a symmetric matrix with guaranteed accuracy, with fewer operations
than  one  matrix  factorization.  According  to  Parlett  [1980],  the  Lanczos  algo¬ 
rithm  usually  requires  0(nSs)  or  fewer  arithmetic  operations.  Thus,  calculating 
the  desired  eigen-information  explicitly  may  not  introduce  a  significant  addi¬ 
tional  cost. 

Figure 6.1 below contains a diagram of our proposed implementation of Algorithm 5.3. This implementation includes estimation of the extreme eigenvalues and the corresponding eigenvectors of $B_k$. This would only be done at the first minor iteration of each major ($k$-th) iteration. If additional minor iterations were required at this major iteration, the necessary eigen-information would already be known and so one would immediately calculate the step in part a) or b) of Algorithm 5.3.

In two places in Figure 6.1 there are "attempted Cholesky decompositions", of $B_k$ and $B_k + \alpha I$. These algorithms are given in Gill, Murray, and Wright [1981] or Dennis and Schnabel [1983]. If the matrix is numerically positive definite, the factorization algorithm calculates the $LL^T$ factorization of the matrix. If it is not numerically positive definite, the factorization algorithm returns a lower bound on the most negative eigenvalue of the matrix and a direction of negative curvature $v$ for the matrix (i.e. for $B_k$ or $B_k + \alpha I$, respectively). The factorization algorithm requires about $n^3/6$ multiplications and additions in all cases.
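The idea of an attempted factorization that reports a direction of negative curvature on failure can be sketched as follows. This is a plain textbook Cholesky with an explicit failure branch, written for illustration only; it is not the factorization of the cited references, it uses no pivoting or numerical-definiteness tolerance, and the function names are hypothetical. When the pivot at step $j$ is nonpositive, the vector $v = (-z, 1, 0, \dots, 0)$ with $L_{11}^T z = l$ satisfies $v^T A v = \text{pivot} \le 0$. A Gerschgorin bound is included as one standard, generally weaker, source of a lower bound on the smallest eigenvalue.

```python
import math

def attempted_cholesky(A):
    """Try A = L L'. Return (L, None) on success, or (None, v) with v'Av <= 0."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Entries of row j to the left of the diagonal.
        for k in range(j):
            s = sum(L[j][m] * L[k][m] for m in range(k))
            L[j][k] = (A[j][k] - s) / L[k][k]
        pivot = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            # Back-substitute L11' z = l, where l is row j of L so far.
            z = [0.0] * j
            for k in reversed(range(j)):
                s = sum(L[m][k] * z[m] for m in range(k + 1, j))
                z[k] = (L[j][k] - s) / L[k][k]
            v = [-zk for zk in z] + [1.0] + [0.0] * (n - j - 1)
            return None, v
        L[j][j] = math.sqrt(pivot)
    return L, None

def gerschgorin_lower_bound(A):
    """Lower bound on the smallest eigenvalue of a symmetric matrix."""
    n = len(A)
    return min(A[i][i] - sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))

# Illustrative indefinite matrix; the factorization fails at the last pivot.
A = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [1.0, 1.0, 0.0]]
L, v = attempted_cholesky(A)
Av = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
curvature = sum(v[i] * Av[i] for i in range(3))
print(v, curvature, gerschgorin_lower_bound(A))
```

The returned `v` could then serve as a restart vector for an eigenvalue iteration, in the spirit of the restart described in the text.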

Since the Lanczos algorithm is restarted using this direction $v$, the $\lambda_1$ that results from the next use of the Lanczos algorithm at the same iteration must be smaller than the curvature of $v$. Thus in particular, the $\lambda_1$ resulting from the Lanczos algorithm can be positive only if $B_{k-1}$ was not positive definite and one is going through the left-hand loop of Figure 6.1 for the first time in the $k$-th iteration.

A  possible  choice  of  a  in  Figure  6. 1  is 

w®(0,\i)  „ 

a:= - Xi 

e 

where  efcVmachinee.  If  Bk+al  is  positive  definite  and  step  bii)  is  required,  v 
almost certainly will satisfy the conditions on $q_k$ in Lemma 4.5; this may be tested using $-\alpha$, which is a lower bound on $\lambda_1(B_k)$. It is theoretically possible that additional iterations of the Lanczos procedure would be required to find a satisfactory $v$ in this case.

Figure 6.2 shows how our implementation of Algorithm 5.3 given in Figure 6.1 can be modified to sometimes substitute the simpler step b) of Algorithm 5.4 for step b) of Algorithm 5.3, when $B_k$ is not positive definite. A lower bound on $\lambda_1(B_k)$ is always available, initially from the Gerschgorin theorem, and subsequently from the failed Cholesky decomposition. If the negative curvature direction $v$ from the Lanczos algorithm satisfies the condition of Lemma 4.5 for $q_k$, using this bound in place of $\lambda_1(B_k)$, then step b) of Algorithm 5.4 may be taken. If the constant $c_4$ in Lemma 4.5 is chosen small, the first $v$ probably will satisfy the condition in Lemma 4.5. If step b) of Algorithm 5.4 is taken as soon as it is possible, the step selection strategy of Figures 6.1 and 6.2 may require no matrix factorizations when $B_k$ is not positive definite. Another alternative is to take this step only if some fixed number of Cholesky decompositions have failed, say two.

[Figure 6.1: An implementation of the step selection strategy of Algorithm 5.3.]

The implementations in Figures 6.1 and 6.2 strive to minimize the number of matrix factorizations. When $B_k$ is positive definite, only one factorization will be needed; in addition the Lanczos work will be required only if $B_{k-1}$ was not positive definite. When $B_k$ is not positive definite, the algorithm will perform between zero and $n$ factorizations, usually between 0 and 2 or 3. When the step in Figure 6.2 is taken on the first iteration, no factorizations are needed. Generally the Lanczos algorithm will yield a good enough approximation to $\lambda_1(B_k)$ that the first $\alpha$ will yield a positive definite $B_k + \alpha I$, and thus only one factorization will be required in the indefinite case. In certain rather pathological cases, the Lanczos algorithm can tend to converge not to the smallest eigenvalue but to a larger one, in which case the Cholesky factorization will fail. Then, the algorithm will use the direction of negative curvature from the Cholesky failure as a starting vector for the Lanczos process, which guarantees that the Lanczos algorithm will converge to a smaller eigenvalue than the last one. Thus, although we expect only one factorization to be required in the indefinite case, it is possible that several may be needed, but never more than $n$.

[Figure 6.2: Optional augmentation with the step selection strategy of Algorithm 5.4.]

In summary, this implementation will require one factorization on all positive definite Hessian matrices, and most indefinite ones. In addition, when $B_k$ is not positive definite it will require the work involved in the Lanczos process, which is likely to be considerably less than the work of one factorization when $n$ is large. The implementation satisfies the requirements of Lemmas 4.3 and 4.5, and hence a computer code using this step in the framework of Algorithm 2.1 is second order stationary point convergent. Of course, by Theorem 2.2 it is also locally q-quadratically convergent. The techniques in Figure 6.1 could also be employed in the implementation of other step selection strategies, in particular the indefinite line search step given in Algorithm 5.1 or the modified "optimal" step given in Algorithm 5.5, leading again to implementations that are second order stationary point convergent.